Information causality from an entropic and a probabilistic perspective 
The information causality principle is a generalisation of the no-signalling principle which implies
some of the known restrictions on quantum correlations. But despite its clear physical motivation, 
information causality is formulated in terms of a rather specialised game and figure of merit. We 
explore different perspectives on information causality, discussing the probability of success as the 
figure of merit, a relation between information causality and the non-local 'inner-product game', 
and the derivation of a quadratic bound for these games. We then examine an entropic formulation 
of information causality with which one can obtain the same results, arguably in a simpler fashion. 
I. INTRODUCTION

Quantum theory has many strange properties, but per- 
haps the most surprising is that of non-locality. Some 
quantum states, known as entangled states, cannot be 
described by giving a separate quantum state for each 
system, or even by a probabilistic mixture of such states. 
This is not just an artifact of the mathematical formalism;
many entangled states give rise to observable correlations
which cannot be explained by any local model [l|-
y]. However, an important caveat is that these non-local 
correlations cannot be used for superluminal signalling. 

Although this area has been extensively studied, we still 
don't have a good intuition about which non-local cor- 
relations are achievable in quantum theory, and what 
they can be used for. They are certainly helpful in 
some non-local tasks [J, Q , but it has been shown that 
even stronger correlations are possible without generating superluminal signalling [6]. Furthermore, there have
recently been a number of results describing non-local
tasks for which quantum entanglement is not helpful at
all, whilst stronger non-local correlations give an advantage [7-9]. By gaining a better understanding of quantum non-locality, we hope to hone our intuitions about
its information-theoretic uses, and perhaps learn more 
about why nature is quantum. 

In this paper, we will discuss one particular non-local 
task for which quantum non-locality is not helpful (at 
least with the original figure of merit), known as information causality [8]. This is an appealing principle which
one would reasonably expect to hold, and which quantum
theory obeys, yet which can be violated using correlations
slightly stronger than quantum theory permits [8, 10].

Information causality relates to a particular type of 
game: a bit string x of length n is chosen uniformly at 
random and given to Alice, whilst Bob is given a random 
number $k$ ($1 \le k \le n$). Alice may then send an $m$-bit message $\alpha$ to Bob, after which Bob must try to guess $x_k$, the $k$'th bit of Alice's original bit-string. Bob's guess when his input is $k$ is denoted $\beta_k$. The parties may decide on a joint strategy and may initially share correlated resources but play from separate locations.
The information causality principle states that 



$$I \equiv \sum_{k=1}^{n} I_c(x_k : \beta_k) \le m, \tag{1}$$
where $I_c(X : Y)$ denotes the classical mutual information of variables $X$ and $Y$ [26]. The intuition behind this bound is that the total information that Bob can access about Alice's bits cannot exceed the size of the message she sent. Indeed, the inequality in (1) is saturated if Alice simply sends to Bob the first $m$ bits of $x$, so that $I_c(x_k : \beta_k) = 1$ if $1 \le k \le m$, and 0 otherwise.
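The saturating strategy can be checked numerically. The following is a minimal sketch, assuming (for illustration only) that Bob outputs the received bit when $k \le m$ and the constant guess 0 otherwise; the function names are hypothetical and not taken from the paper.

```python
from itertools import product
from math import log2

def mutual_information(joint):
    """Classical mutual information (in bits) of a joint distribution {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log2(p / (px[x] * py[y])) for (x, y), p in joint.items() if p > 0)

def total_information(n, m):
    """I = sum_k I_c(x_k : beta_k) when Alice sends her first m bits.

    Assumed decoding (illustrative): Bob outputs the received bit if k <= m,
    and the constant guess 0 otherwise.
    """
    total = 0.0
    for k in range(n):
        joint = {}
        for x in product((0, 1), repeat=n):   # uniform input string
            guess = x[k] if k < m else 0
            key = (x[k], guess)
            joint[key] = joint.get(key, 0.0) + 2.0 ** -n
        total += mutual_information(joint)
    return total

I_value = total_information(n=4, m=2)   # expected to equal m
```

Each of the first $m$ bits contributes one full bit of mutual information and the remaining bits contribute nothing, so the sum equals $m$.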

It is proven in [8] that information causality is obeyed in both the quantum and classical world. However, it can be violated in worlds governed by different physical laws (such as 'box world' [11, 12], which permits all non-signalling correlations). In what follows, we first discuss probability of success in the information causality
game. We then derive a bound which relates informa- 
tion causality to a different non-local game, in which Al- 
ice and Bob must compute the inner product of two bit 
strings. Finally, we will explore an alternative formu- 
lation and derivation of information causality based on 
entropy rather than mutual information. 



II. PROBABILITY OF SUCCESS FOR 
INFORMATION CAUSALITY 

Although quantum entanglement gives no advantage over a classical strategy in the information causality game when $I$ (defined by (1)) is the figure of merit, it is not true that every quantum strategy can be classically simulated. In fact, if probability of success is used as the figure of merit instead, it can easily be seen that entangled quantum states allow one to do better than in the classical world. For example, in a simple version of the game in which $n = 2$ and $m = 1$, the optimal classical probability of success is $\frac{3}{4}$ (e.g. when Alice sends Bob $\alpha = x_1$ and he guesses $\beta_k = \alpha$, they always win when $k = 1$ and win half the time when $k = 2$). However, by exploiting well-known quantum violations of Bell inequalities, Alice and Bob can achieve a success probability of $\frac{2+\sqrt{2}}{4}$. To do this, Alice and Bob first generate bits $a$ and $b$ satisfying $a \oplus b = (x_1 \oplus x_2)(k - 1)$ with probability $\frac{2+\sqrt{2}}{4}$, where $\oplus$ denotes addition modulo 2. This is equivalent to the quantum Tsirelson bound for the CHSH inequality [3, 13]. Then Alice sends Bob $\alpha = a \oplus x_1$ and Bob outputs $\beta_k = b \oplus \alpha$.
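Both numbers quoted for the $n = 2$, $m = 1$ game can be reproduced with a short calculation. The sketch below brute-forces all deterministic classical strategies and evaluates one standard CHSH measurement choice on the state $(|00\rangle + |11\rangle)/\sqrt{2}$. The particular observables are an assumption chosen for illustration; the paper does not specify them.

```python
from itertools import product
from math import sqrt

def best_classical_success():
    """Brute-force the best deterministic strategy for n = 2, m = 1."""
    inputs = list(product((0, 1), repeat=2))
    best = 0.0
    # Alice: a map from 2-bit strings to one message bit (16 choices).
    for enc in product((0, 1), repeat=4):
        # Bob: for each k, a map from the message bit to a guess (16 choices).
        for dec in product((0, 1), repeat=4):
            wins = 0
            for i, x in enumerate(inputs):
                for k in (0, 1):
                    wins += dec[2 * k + enc[i]] == x[k]
            best = max(best, wins / 8)
    return best

def correlator(A, B):
    """<Phi+| A (x) B |Phi+> for real 2x2 matrices, equal to sum_ij A_ij B_ij / 2."""
    return sum(A[i][j] * B[i][j] for i in range(2) for j in range(2)) / 2

def quantum_success():
    """Success probability of the CHSH-based strategy with assumed observables."""
    r = 1 / sqrt(2)
    Z, X = [[1, 0], [0, -1]], [[0, 1], [1, 0]]
    alice = [Z, X]                                    # indexed by s = x1 XOR x2
    bob = [[[r, r], [r, -r]], [[r, -r], [-r, -r]]]    # (Z+X)/sqrt2, (Z-X)/sqrt2; index t = k-1
    total = 0.0
    for s in (0, 1):
        for t in (0, 1):
            e = correlator(alice[s], bob[t])          # P(a XOR b = 0) - P(a XOR b = 1)
            total += (1 + (-1) ** (s * t) * e) / 2    # P(a XOR b = s*t)
    return total / 4
```

Because $\beta_k = b \oplus a \oplus x_1$ equals $x_k$ exactly when $a \oplus b = (x_1 \oplus x_2)(k-1)$, the second function also gives the success probability in the information causality game.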

It is also possible to obtain very different values of $I$ for strategies with the same probabilities of success. As above, Alice can send Bob her first bit to obtain $I = 1$ and probability of success $\frac{3}{4}$; alternatively, Alice and Bob can randomly "mix" this strategy with one where Alice sends Bob her second bit and he outputs it, so that the overall probability of success is the same but

$$I = I_c(x_1 : \beta_1) + I_c(x_2 : \beta_2) = 2\left[1 - H\!\left(\tfrac{3}{4}, \tfrac{1}{4}\right)\right] \approx 0.38. \tag{2}$$



Furthermore, it is clear that a small amount of noise added to the first strategy will do better than the second strategy in terms of $I$, but worse in terms of success probability, so these two figures of merit are not monotonically related.
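The comparison between the two figures of merit can be made concrete. This sketch uses the fact that, for a uniform bit guessed correctly with probability $P_k$ independently of its value, $I_c(x_k : \beta_k) = 1 - h(P_k)$, with $h$ the binary entropy. The noise level `eps` is an arbitrary illustrative choice, not a value from the paper.

```python
from math import log2

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def figures_of_merit(success_probs):
    """Return (I, average success probability) from per-input success probabilities P_k."""
    I = sum(1 - h(p) for p in success_probs)
    p_success = sum(success_probs) / len(success_probs)
    return I, p_success

first = figures_of_merit([1.0, 0.5])        # always send x_1
mixed = figures_of_merit([0.75, 0.75])      # equal mixture of "send x_1" and "send x_2"
eps = 0.05                                  # assumed bit-flip probability on the message
noisy = figures_of_merit([1 - eps, 0.5])    # noisy version of the first strategy
```

With these numbers the noisy strategy has a larger $I$ than the mixed strategy but a smaller success probability, which is the non-monotonic behaviour described above.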

The optimal classical strategy to maximize the probability of success in the case when $m = 1$ has already been derived for general $n$, in the context of Random-Access Coding [15]. It is attained by using the "majority-vote" strategy, in which Alice simply sends Bob the bit that most frequently occurs in her string. This gives success probability

$$P^{C}_{\text{success}} = \frac{1}{2}\left[1 + \frac{1}{2^{n-1}}\binom{n-1}{\lfloor \frac{n-1}{2} \rfloor}\right]. \tag{3}$$



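Using Stirling's approximation, one can derive the asymptotic behaviour of this probability:

$$P^{C}_{\text{success}} \approx \frac{1}{2}\left(1 + \sqrt{\frac{2}{\pi n}}\right) \quad \text{for large } n. \tag{4}$$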
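As a sanity check on the majority-vote strategy, one can enumerate all input strings for small $n$ and compare with a closed form. The sketch below assumes the closed form $\frac{1}{2}\left[1 + 2^{-(n-1)}\binom{n-1}{\lfloor (n-1)/2 \rfloor}\right]$ and its leading-order Stirling estimate $\frac{1}{2}\left(1 + \sqrt{2/(\pi n)}\right)$. The tie-breaking rule for even $n$ is an arbitrary choice that does not affect the result.

```python
from itertools import product
from math import comb, pi, sqrt

def majority_vote_success(n):
    """Exact success probability when Alice sends the majority bit of x."""
    wins = 0
    for x in product((0, 1), repeat=n):
        ones = sum(x)
        if 2 * ones == n:
            msg = x[0]                # tie-break (assumed; any fixed rule gives the same count)
        else:
            msg = int(2 * ones > n)
        wins += sum(bit == msg for bit in x)   # Bob guesses msg whatever k is
    return wins / (n * 2 ** n)

def closed_form(n):
    """Closed-form success probability assumed for the majority-vote strategy."""
    return 0.5 * (1 + comb(n - 1, (n - 1) // 2) / 2 ** (n - 1))

def stirling_estimate(n):
    """Leading-order large-n estimate obtained from Stirling's approximation."""
    return 0.5 * (1 + sqrt(2 / (pi * n)))
```

For example, `majority_vote_success(2)` and `closed_form(2)` both give 3/4, matching the optimal classical value quoted earlier for $n = 2$.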



We show in the next section that the optimal quantum probability of success for the same situation is

$$P^{Q}_{\text{success}} = \frac{1}{2}\left(1 + \frac{1}{\sqrt{n}}\right), \tag{5}$$



which is always strictly larger than the classical limit. This extends a result obtained in [16] for particular $n$.

Interestingly, (5) is also the optimal success probability
when Alice is allowed to send a qubit to Bob instead of a 
classical bit, but Alice and Bob do not share an entangled 
state [il,[i|]. 

The probability of success has a clean operational interpretation as a figure of merit: it is the asymptotic fraction of games one would expect to win over many independent repetitions. Although it sounds appealing, the operational meaning of $I$ is less natural. In particular, suppose Alice and Bob play the game many times, then Alice is told Bob's input $k$ for each round, and she sends him some supplementary classical information which (together with his guesses $\beta_k$) he must use to output the correct value of $x_k$ for each round. The average amount of supplementary information per round which Alice must send Bob is $(1 - I/n)$. This follows from a result of ^^ that the asymptotic amount of information (using coding over many rounds) required to learn $x_k$ given that you hold $\beta_k$ is $H(x_k|\beta_k) = H(x_k) - I(x_k : \beta_k)$.

However, although $I$ is a less natural a priori figure of merit than success probability, its appeal lies in the simplicity of the bound given by (1). In particular, the maximum value of $I$ is the same for classical or quantum strategies, and can be simply stated for any message length $m$ (by contrast, the maximum success probabilities given by (3)-(5) are complicated, depend on $n$ and only apply when $m = 1$).



III. INFORMATION CAUSALITY AND THE 
INNER PRODUCT GAME 

Given that the mutual information is a complicated non-linear function of the associated probabilities, it is surprising that the bound given by information causality can be used to derive the Tsirelson bound, which can be understood as a bound on the quantum success probability for a particular non-local game [28]. Even more surprisingly, information causality can be used to generate part of the curved surface of the set of achievable quantum correlations [10].

To investigate this, we note that the proof of the Tsirelson bound given in [8] can be decomposed into several steps. The first is to prove that the information causality principle $I \le m$ implies a bound $\sum_{k=1}^{n} [1 - h(P_k)] \le m$ on the binary entropy [29] of the success probability $P_k$ given a particular input for Bob. This entropic bound can be transformed into a quadratic bound on the bias $E_k = 2P_k - 1$ achieved in the game by noting that $1 - h(P_k) \ge \frac{E_k^2}{2 \ln 2}$. The information causality principle can therefore be used to generate the bound

$$\sum_{k=1}^{n} E_k^2 \le 2m \ln 2. \tag{6}$$



Finally, the authors consider a particular strategy for playing the game in which $m = 1$ and $n$ is a power of 2, and show that the ability to generate correlations violating the Tsirelson bound would allow one to violate (6) for sufficiently large $n$. Hence, given information causality, the Tsirelson bound holds.
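The two ingredients of this argument can be explored numerically. The sketch below first checks the inequality $1 - h(P) \ge E^2/(2\ln 2)$ on a grid, then finds the smallest $n = 2^j$ at which the quadratic bound with $m = 1$ would be violated. It assumes, as a property of the nested protocol of [8] that is not derived in this text, that each of the $n = 2^j$ bits is guessed with bias $E^j$ when the underlying CHSH-type correlations have bias $E$.

```python
from math import log, log2, sqrt

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def gap(bias):
    """1 - h(P) - E^2 / (2 ln 2) with P = (1 + E) / 2; expected to be non-negative."""
    return 1 - h((1 + bias) / 2) - bias ** 2 / (2 * log(2))

def first_violation(chsh_bias, max_levels=60):
    """Smallest j (with n = 2**j) for which sum_k E_k^2 > 2 ln 2, or None.

    Assumption: each of the n = 2**j bits is guessed with bias chsh_bias**j,
    so that sum_k E_k^2 = (2 * chsh_bias**2)**j.
    """
    for j in range(1, max_levels + 1):
        if (2 * chsh_bias ** 2) ** j > 2 * log(2):
            return j
    return None

tsirelson = first_violation(1 / sqrt(2))   # bias at the Tsirelson bound
stronger = first_violation(0.75)           # a bias slightly above 1/sqrt(2)
```

Under this assumption the sum grows geometrically whenever $2E^2 > 1$, which is why any bias above $1/\sqrt{2}$ eventually violates the bound while the Tsirelson value never does.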

As the quadratic bound given by (6) plays a key role in deriving the Tsirelson bound from information causality, it is interesting to investigate such bounds directly in quantum theory. To facilitate this, we first consider a seemingly unrelated non-local game, in which the aim is to produce the inner product of two bit strings. In this inner product game, Alice and Bob are given uniformly random $n$-bit strings $x$ and $y$ respectively. Then without communicating, Alice and Bob must output bits $a$ and $b$ respectively such that $a \oplus b = x \cdot y$, where $x \cdot y = x_1 y_1 \oplus x_2 y_2 \oplus \ldots \oplus x_n y_n$.

The ability to win the inner product game perfectly 
would allow the parties to non-locally compute any func- 
tion of their inputs [iJI , and therefore to solve any com- 
munication complexity problem with only a single bit of 
communication. 

We can derive a bound on the inner product game which is very similar to (6). Assume that Alice and Bob share an initial entangled state $|\psi\rangle$, and their outputs are obtained by measuring the operators $\hat{a}_x$ and $\hat{b}_y$ respectively (with eigenvalues 0, 1). The bias they achieve in the game when they are given inputs $x$ and $y$ is

$$E_{xy} = \langle\psi| (-1)^{\hat{a}_x + \hat{b}_y + x \cdot y} |\psi\rangle, \tag{7}$$

where the probability of success is

$$P^{Q}_{\text{success}} = \frac{1}{2}\left(1 + \frac{1}{2^{2n}} \sum_{xy} E_{xy}\right). \tag{8}$$



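Similarly, the average bias they achieve when Bob is given $y$ and we average over Alice's input is given by $E_y = \frac{1}{2^n} \sum_x E_{xy}$.

To derive a quadratic bound, we adopt a similar approach to [7]. We define the normalised states

$$|A\rangle = \frac{1}{\sqrt{2^n}} \sum_x (-1)^{\hat{a}_x} |\psi\rangle \otimes |x\rangle, \tag{9}$$

$$|B_y\rangle = \frac{1}{\sqrt{2^n}} \sum_x (-1)^{\hat{b}_y + x \cdot y} |\psi\rangle \otimes |x\rangle, \tag{10}$$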



where the $|B_y\rangle$ states form an orthonormal set satisfying $\langle B_y | B_{y'} \rangle = \delta_{yy'}$. Since $E_y = \langle A | B_y \rangle$, it follows that

$$\sum_y E_y^2 = \sum_y \langle A | B_y \rangle \langle B_y | A \rangle = \langle A | \Big( \sum_y |B_y\rangle\langle B_y| \Big) | A \rangle \le 1, \tag{11}$$

where in the last step we have used the fact that $\sum_y |B_y\rangle\langle B_y|$ is a projector and $|A\rangle$ is normalised. A similar result was obtained independently by Pawlowski and Winter using a different method, and described very recently in [1^ |30{ .

We can also obtain a bound on the probability of success from (11), by taking the average of $E_y$ over Bob's $2^n$ inputs and noting that $\big(\frac{1}{2^n}\sum_y E_y\big)^2 \le \frac{1}{2^n}\sum_y E_y^2$:

$$P^{Q}_{\text{success}} = \frac{1}{2}\left(1 + \frac{1}{2^n}\sum_y E_y\right) \le \frac{1}{2}\left(1 + \frac{1}{\sqrt{2^n}}\right). \tag{12}$$

When $n = 1$, the inner-product game is equivalent to the CHSH game Q , and this bound on the success probability corresponds to the usual Tsirelson bound.

The bound given by (11) actually holds regardless of the probability distribution over Bob's input $y$. This generalisation allows us to derive a bound on a non-local version of the information causality game, in which Alice is given a random $n$-bit string $x$, Bob is given a random number $k$ satisfying $1 \le k \le n$, and they attempt to produce bits $a$ and $b$ such that $a \oplus b = x_k$ without communicating. If Bob's bit-string in the inner-product game is chosen at random from the set of bit-strings containing a single one (i.e. from the bit-strings of Hamming weight 1), with $k$ denoting the position of the non-zero bit in $y$, then $x \cdot y = x_k$. In this case, the inner-product game is the same as the non-local information causality game and (11) gives

$$\sum_{k=1}^{n} E_k^2 \le 1. \tag{13}$$

Note that this is a stronger bound than (6), which was obtained from information causality. Similarly, an analogous derivation to (12) gives

$$P^{Q}_{\text{success}} \le \frac{1}{2}\left(1 + \frac{1}{\sqrt{n}}\right). \tag{14}$$

We now show that the bound given by (13) can be saturated in quantum theory for any choice of $E_k$. It was proved in [19] (and used in |1^) that for any set of real vectors $u_x$ and $v_y$ of at most unit length, we can find a quantum state $|\psi\rangle$ of a bipartite system, and binary-valued operators $\hat{a}_x$ and $\hat{b}_y$ (which can be measured locally on subsystems A and B), such that



$$\langle\psi| (-1)^{\hat{a}_x + \hat{b}_y} |\psi\rangle = u_x \cdot v_y, \tag{15}$$

and hence

$$E_{xy} = (-1)^{x \cdot y}\, u_x \cdot v_y. \tag{16}$$



For any desired biases $E_y$ satisfying $\sum_y E_y^2 \le 1$ we can consider the vectors

$$u_x = \sum_y (-1)^{x \cdot y} E_y\, e_y, \tag{17}$$

$$v_y = e_y, \tag{18}$$

where $e_y$ denotes an orthonormal basis for a real vector space with dimension equal to the number of different inputs for Bob. This gives $E_{xy} = E_y$, and hence we can achieve any set of biases satisfying (11). In particular, we could obtain an equal bias for all of Bob's possible inputs in the non-local information causality game ($E_k = \frac{1}{\sqrt{n}}$), which would achieve the optimal probability of success.
Although these results apply to the non-local version 
of the information causality game, any strategy can be 
transferred to the original version of the game with 
$m = 1$, with the same probability of success. Alice simply sends the message $\alpha = a$ to Bob, and he outputs
$\beta = \alpha \oplus b$. This is not the only type of strategy which is
possible in the original information causality game (e.g. 
Bob's measurement could depend on Alice's message). 
However, in [l^ an identical inequality to P^ is derived 
for the original game, hence the optimal strategy for the 
non-local version of the game is also optimal when transferred to the original game. Note that the strategies used in [8] to derive the Tsirelson bound, and to achieve perfect success given arbitrary non-signalling resources, are also of this form.

The bound $\sum_y E_y^2 \le 1$ for the inner product game seems to capture a great deal about the possible quantum correlations, yet note that this inequality can also be saturated by a classical strategy. In particular, if Alice and Bob output $a = \alpha \cdot x$ and $b = 0$ (for a fixed $n$-bit string $\alpha$), they will achieve a bias of 1 when $y = \alpha$ and 0 in every other case.
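The classical saturation can be confirmed by enumeration. This sketch computes the biases $E_y = 2^{-n} \sum_x (-1)^{a(x) + b(y) + x \cdot y}$ for a deterministic strategy in which Alice outputs the inner product of her input with a fixed string and Bob outputs 0. The fixed string `alpha` below is an arbitrary example.

```python
from itertools import product

def dot_bits(x, y):
    """Inner product of two bit strings modulo 2."""
    return sum(a * b for a, b in zip(x, y)) % 2

def biases(a_out, b_out, n):
    """E_y = 2^-n sum_x (-1)^(a(x) + b(y) + x.y) for a deterministic strategy."""
    strings = list(product((0, 1), repeat=n))
    return {y: sum((-1) ** (a_out(x) + b_out(y) + dot_bits(x, y)) for x in strings) / 2 ** n
            for y in strings}

n = 3
alpha = (1, 0, 1)                                    # arbitrary fixed string (assumed)
E = biases(lambda x: dot_bits(alpha, x), lambda y: 0, n)
sum_of_squares = sum(e * e for e in E.values())
```

The bias is concentrated entirely on $y = \alpha$, so the sum of squared biases equals 1 even though the average success probability over uniformly random $y$ is only slightly above 1/2.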



IV. INFORMATION CAUSALITY FROM 
ENTROPY 

Given the above, it appears that the particular math- 
ematical form of the mutual information is not central 
in defining the boundary of the set of quantum correlations (as the proof proceeds via a quadratic bound), and the choice of $I$ rather than probability of success as the figure of merit seems somewhat arbitrary. However, the
fact that quantum theory obeys information causality ac- 
tually follows from the existence of a natural extension 
of the classical mutual information to quantum states. 
Can we focus on this as a defining property of quantum 
theory? 

In general probabilistic theories, the state of a system is 
characterised by a complete description of the probability 
of each measurement outcome, for any possible measurement on that system [11, 20, 21]. A specific probabilistic theory is defined by allowing certain types of systems, and certain states on those systems: for example, classical theory consists of systems specified by a single probability distribution (such as a ball in one of several boxes). In any such theory, it was shown in [8] that the information causality principle $I \le m$ will hold if an analogue of the mutual information $I(X : Y)$ can be defined for all systems $X$ and $Y$ (which may be composite) with the


following properties: 

(i) Consistency: Whenever $X$ and $Y$ are classical systems, $I$ reduces to the classical mutual information, $I(X : Y) = I_c(X : Y)$.

(ii) Data Processing: Whenever a transformation is performed on $Y$ alone, $\Delta I(X : Y) \le 0$.

(iii) Chain Rule: For all tripartite systems $X, Y, Z$, $I(X : YZ) - I(X : Z) = I(XZ : Y) - I(Z : Y)$.

(iv) Symmetry: $I(X : Y) = I(Y : X)$.

(v) Non-negativity: $I(X : Y) \ge 0$.

It is well known that all of these properties are satisfied by $I_q$ and $I_c$, the quantum and classical versions of the mutual information. The proof of information causality also assumes the validity of some natural operations, in particular the ability to discard a system, or to prepare a system in a state determined by the value of a classical variable. These transformations can be defined for any theory in the general probabilistic framework of [11]. If we consider discarding both $X$ and $Y$, we can actually derive non-negativity from the symmetry and data-processing conditions, since (denoting a discarded system by $\emptyset$) $I(X : Y) \ge I(X : \emptyset) = I(\emptyset : X) \ge I(\emptyset : \emptyset) = 0$, hence condition (v) can easily be eliminated [31].

However, while the other properties seem intuitively 
reasonable, property (iii) seems like a strange demand. 
Furthermore, the fact that the mutual information nec- 
essarily concerns a pair of systems makes it a somewhat 
complicated quantity. 

In the remainder of this section, we show that the Information Causality principle follows more simply from the existence of a 'good' measure of entropy in a general theory. In particular, the entropy only concerns a single system (although this may be composite), and is only required to obey two conditions.

(I) Consistency: If system $X$ is classical, $H(X)$ reduces to the classical entropy, $H(X) = H_c(X)$.

(II) Evolution with an ancilla: For any two systems $X$ and $Y$, whenever a transformation is performed on $Y$ alone,

$$\Delta H(XY) \ge \Delta H(Y). \tag{19}$$



Condition (I) says that $H$ gives the asymptotic compression rate for classical data. Condition (II) can be understood intuitively as saying that a local transformation can generate more uncertainty than its effect on an individual subsystem would suggest, as it can destroy but not create correlations. If we also define a conditional entropy analogously to the quantum and classical quantity, as $H(X|Y) = H(XY) - H(Y)$, we can alternatively re-express (19) as $\Delta H(X|Y) \ge 0$. We can also express (II) symmetrically as the requirement that $\Delta H(XY) \ge \Delta H(X) + \Delta H(Y)$ under local transformations on $X$ and $Y$.
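For classical systems, the evolution condition can be tested numerically with the Shannon entropy. The following sketch draws random joint distributions of $X$ and $Y$ and random stochastic maps acting on $Y$ alone, and records $\Delta H(XY) - \Delta H(Y)$. The dimensions, the random seed, and the helper names are arbitrary illustrative choices.

```python
import random
from math import log2

def entropy(dist):
    """Shannon entropy (bits) of a distribution given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal_y(joint):
    """Marginal distribution of Y from a joint distribution {(x, y): p}."""
    out = {}
    for (x, y), p in joint.items():
        out[y] = out.get(y, 0.0) + p
    return out

def apply_to_y(joint, channel):
    """Apply a stochastic map channel[y][y2] = P(y2 | y) to system Y only."""
    out = {}
    for (x, y), p in joint.items():
        for y2, q in enumerate(channel[y]):
            out[(x, y2)] = out.get((x, y2), 0.0) + p * q
    return out

def condition_II_gap(joint, channel):
    """Delta H(XY) - Delta H(Y) under a transformation of Y alone."""
    after = apply_to_y(joint, channel)
    return (entropy(after) - entropy(joint)) - (entropy(marginal_y(after)) - entropy(marginal_y(joint)))

random.seed(1)
gaps = []
for _ in range(200):
    w = [random.random() for _ in range(6)]
    joint = {(x, y): w[3 * x + y] / sum(w) for x in range(2) for y in range(3)}
    channel = []
    for _ in range(3):
        r = [random.random() for _ in range(3)]
        channel.append([v / sum(r) for v in r])
    gaps.append(condition_II_gap(joint, channel))
```

A non-negative gap in every trial is what one expects classically, since the condition is equivalent to $H(X|Y)$ not decreasing when $Y$ is processed locally.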

Given an entropy function obeying the above conditions, 
we can define a mutual information analogously to the 
quantum and classical case as 



$$I(X : Y) = H(X) + H(Y) - H(XY). \tag{20}$$



This automatically ensures that conditions (iii) and (iv) are satisfied, removing the awkwardness of having to postulate the Chain Rule, and (i) and (ii) follow trivially from (I) and (II) respectively.
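That the chain rule and symmetry hold identically for a mutual information of the form $H(X) + H(Y) - H(XY)$ can be illustrated with the Shannon entropy of a random three-bit distribution. This is a minimal sketch; the distribution, the seed, and the labelling of subsystems by tuple positions are illustrative assumptions.

```python
import random
from itertools import product
from math import log2

random.seed(0)
weights = [random.random() for _ in range(8)]
joint = {xyz: w / sum(weights) for xyz, w in zip(product((0, 1), repeat=3), weights)}

def H(systems):
    """Shannon entropy (bits) of the marginal on the given positions (0=X, 1=Y, 2=Z)."""
    marginal = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in systems)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)

def I(a, b):
    """Mutual information built from the entropy as H(a) + H(b) - H(ab)."""
    return H(a) + H(b) - H(tuple(a) + tuple(b))

X, Y, Z = (0,), (1,), (2,)
chain_lhs = I(X, Y + Z) - I(X, Z)     # I(X : YZ) - I(X : Z)
chain_rhs = I(X + Z, Y) - I(Z, Y)     # I(XZ : Y) - I(Z : Y)
```

Both sides reduce to $H(XZ) + H(YZ) - H(XYZ) - H(Z)$, so the agreement does not depend on the particular distribution chosen.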
The existence of an entropy function with properties (I) and (II) is therefore sufficient to derive information causality. Conversely, in any theory in which one can violate Tsirelson's bound, it must be impossible to define an entropy which satisfies assumptions (I) and (II). Several entropies which can be applied to any probabilistic theory, and which always obey (I), have been proposed in [22-24]. A different set of entropic conditions which can
be used to derive information causality were discussed in 

It's not hard to deduce some other standard properties of the entropy from conditions (I) and (II):

Subadditivity: By discarding $Y$, we find from (II) that the entropy is subadditive:

$$H(XY) \le H(X) + H(Y). \tag{21}$$

When $X$ and $Y$ are independent systems, we can also prepare $Y$ locally, which implies that $H(XY) = H(X) + H(Y)$ in this case.

Strong Subadditivity: By discarding $Z$ from the composite $YZ$ in the tripartite system $XYZ$, we obtain strong subadditivity:

$$H(XYZ) + H(Y) \le H(XY) + H(YZ). \tag{22}$$

This inequality is equivalent to subadditivity of the 
conditional entropy. It can also be iterated for a 
larger number of systems to give: 



$$H(X_1 \ldots X_n | Y) \le H(X_1|Y) + \ldots + H(X_n|Y). \tag{23}$$

Positivity of Classical Entropy: Uncertainty about the state of a classical system $X$ can never be negative, even when one conditions on an arbitrary system $Y$.

$$\text{System } X \text{ is classical} \;\Rightarrow\; H(X|Y) \ge 0 \tag{24}$$

We argue this last result in the following way: the state of $X$ is described by a probability distribution on a finite set $E$ of outcomes, and for each outcome $e \in E$ there is a corresponding reduced state $\sigma_e^{(Y)}$ of $Y$. We can therefore obtain the joint state of system $XY$ by a local transformation on $Y$ from a classical system that is initially perfectly correlated with (and identical to) $X$. Before the transformation the conditional entropy is given by $H(X|X)$, and so is non-negative by (I). Then by (II), after the transformation $H(X|Y)$ must also be non-negative.

Information causality can be proven from the existence of an entropy satisfying (I) and (II) by first constructing the mutual information and then applying the proof of [8]. However, it can also be proved more directly using the properties of the entropy derived above, and this yields a slight generalisation of the information causality principle.

Bob's guess $\beta_k$ is derived solely from Alice's message $\alpha$ and Bob's system $B$ before that message is sent. Thus whatever the strategy, there is a transformation from $(\alpha, B)$ to $\beta_k$ for each $k$:

$$\begin{aligned}
\sum_k H_c(x_k|\beta_k) &\ge \sum_k H(x_k|\alpha, B) \\
&\ge H(x|\alpha, B) \\
&= H(x, \alpha, B) - H(\alpha, B) \\
&\ge H(x, \alpha, B) - H(B) - H(\alpha) \\
&= H(x, \alpha, B) - H(x, B) + H(x) - H(\alpha) \\
&= H(\alpha|x, B) + H(x) - H(\alpha) \\
&\ge H_c(x) - H_c(\alpha)
\end{aligned} \tag{25}$$

This is a generalized form of the information causality principle which makes no assumption on the distribution on Alice's input $x$. It can be interpreted as saying that the remaining uncertainty that Bob has about Alice's bits after guessing must be more than the original uncertainty about her inputs minus the information gained by the message. In the special case in which Alice's inputs are independent, $H_c(x) = \sum_k H_c(x_k)$, and we can rearrange (25) to get

$$\sum_k I_c(x_k : \beta_k) \le H_c(\alpha) \le m. \tag{26}$$
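The generalised entropic statement of this section, $\sum_k H_c(x_k|\beta_k) \ge H_c(x) - H_c(\alpha)$, can be spot-checked for purely classical strategies with no shared resource. The sketch below samples arbitrary input distributions, random one-bit encodings $\alpha = f(x)$ and random decodings $\beta_k = g_k(\alpha)$, and records the difference between the two sides. The sampling scheme, seed, and helper names are illustrative assumptions, and a numerical check of this kind is not a proof.

```python
import random
from itertools import product
from math import log2

def entropy(dist):
    """Shannon entropy (bits) of a distribution given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(dist, f):
    """Distribution of f(outcome) when outcome is drawn from dist."""
    out = {}
    for outcome, p in dist.items():
        out[f(outcome)] = out.get(f(outcome), 0.0) + p
    return out

def ic_gap(px, encode, decode, n):
    """sum_k H(x_k | beta_k) - (H(x) - H(alpha)) for a deterministic classical strategy."""
    lhs = 0.0
    for k in range(n):
        pair = marginal(px, lambda x: (x[k], decode[k][encode[x]]))
        guess = marginal(px, lambda x: decode[k][encode[x]])
        lhs += entropy(pair) - entropy(guess)            # H(x_k, beta_k) - H(beta_k)
    alpha = marginal(px, lambda x: encode[x])
    return lhs - (entropy(px) - entropy(alpha))

random.seed(2)
n = 3
strings = list(product((0, 1), repeat=n))
gaps = []
for _ in range(300):
    w = [random.random() for _ in strings]
    px = {x: v / sum(w) for x, v in zip(strings, w)}       # arbitrary input distribution
    encode = {x: random.randint(0, 1) for x in strings}     # one-bit message (m = 1)
    decode = [[random.randint(0, 1) for _ in range(2)] for _ in range(n)]
    gaps.append(ic_gap(px, encode, decode, n))
```

Every sampled gap should be non-negative, and the input distribution here is deliberately not uniform or independent across bits, which is the regime where the generalised form says more than the original principle.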



V. CONCLUSIONS 

Considering probability of success in the information causality game, we see that quantum theory gives an advantage which is not captured by the figure of merit $I$ which is bounded by (1). Investigating how these probabilities are involved in deriving Tsirelson's bound from information causality [8] leads us to a quadratic quantum bound

$$\sum_y E_y^2 \le 1 \tag{27}$$



on the biases achieved given different inputs for Bob in 
the non-local inner product game. This applies for an 



arbitrary distribution over Bob's inputs, and hence to 
the non-local version of the information causality game. 
This is another example of a bound which quantum and 
classical correlations can both saturate, but stronger non- 
local correlations can violate. Furthermore, the fact that 
quantum correlations allow one to achieve any set of bi- 
ases satisfying this rule means that it captures a signifi- 
cant amount about the set of quantum correlations. Can 
we construct useful quadratic bounds on quantum per- 
formance in other nonlocal tasks? 

Instead of considering information causality as a con- 
straint on possible physical theories, it may be helpful to 
think of it as a consequence of the existence of a 'good' 
measure of entropy in the theory. Indeed, we have shown 
that information causality can be derived given any extension of the entropy from classical to more general systems which satisfies $\Delta H(XY) \ge \Delta H(X) + \Delta H(Y)$ under local transformations. Conversely, any theory which
violates information causality (such as box world) can- 
not have an entropy defined in it which obeys the above 
evolution law and agrees with the Shannon entropy for 
classical systems. 



Given the above results, as well as those of [22-24], it seems that the existence of a 'good' entropy for quantum theory, which shares so many of the properties of the classical entropy, is very special within the class of general probabilistic theories. Are there other theories for which one can define an entropy satisfying (I) and (II), or is this a defining feature of quantum theory [32]? The
existence of such an entropy potentially places stronger 
bounds on quantum theory than information causality 
alone. It would be interesting to look for other games 
where quantum theory can do no better than classical 
when such an entropy exists. 

Additional note: Very recently, similar results to those in section IV have been obtained independently in [25].